AITopics

Country:

Europe > Austria > Vienna (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Neural Information Processing SystemsFeb-16-2026, 12:32:34 GMT

NanoBaseLib: A Multi-Task Benchmark Dataset for Nanopore Sequencing

This highlights an urgent need of contributions from the machine learning community.

artificial intelligence, bioinformatics, machine learning, (20 more...)

Country:

Europe > Finland (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Overview (0.48)
Workflow (0.47)
Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing SystemsOct-11-2025, 00:30:58 GMT

8bce223b376f52fb86a148097eebb10d-Paper-Datasets_and_Benchmarks_Track.pdf

dataset, nanopore, sequence, (16 more...)

Country:

Europe > Finland (0.05)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Workflow (0.47)
Research Report (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Neural Information Processing SystemsOct-10-2025, 12:42:10 GMT

a8ea503d91320fcfe12cba61c8a6d285-Paper-Datasets_and_Benchmarks_Track.pdf

language model, prediction, sequence, (14 more...)

Country:

Europe > Austria > Vienna (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Artificial IntelligenceMay-21-2025

OmniGenBench: A Modular Platform for Reproducible Genomic Foundation Models Benchmarking

Yang, Heng, Cole, Jack, Li, Yuan, Chen, Renzhi, Min, Geyong, Li, Ke

The code of nature, embedded in DNA and RNA genomes since the origin of life, holds immense potential to impact both humans and ecosystems through genome modeling. Genomic Foundation Models (GFMs) have emerged as a transformative approach to decoding the genome. As GFMs scale up and reshape the landscape of AI-driven genomics, the field faces an urgent need for rigorous and reproducible evaluation. We present OmniGenBench, a modular benchmarking platform designed to unify the data, model, benchmarking, and interpretability layers across GFMs. OmniGenBench enables standardized, one-command evaluation of any GFM across five benchmark suites, with seamless integration of over 31 open-source models. Through automated pipelines and community-extensible features, the platform addresses critical reproducibility challenges, including data transparency, model interoperability, benchmark fragmentation, and black-box interpretability. OmniGenBench aims to serve as foundational infrastructure for reproducible genomic AI research, accelerating trustworthy discovery and collaborative innovation in the era of genome-scale modeling.

bioinformatics, large language model, machine learning, (24 more...)

2505.14402

Country:

North America > United States (0.67)
Europe (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Promising Solution (0.87)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
(4 more...)

arXiv.org Artificial IntelligenceApr-28-2025

Crypto-ncRNA: Non-coding RNA (ncRNA) Based Encryption Algorithm

Wang, Xu, Wang, Yiquan, Huang, Tin-yeh

A BSTRACT In the looming post-quantum era, traditional cryptographic systems are increasingly vulnerable to quantum computing attacks that can compromise their mathematical foundations. To address this critical challenge, we propose crypto-ncRNA--a bio-convergent cryptographic framework that leverages the dynamic folding properties of non-coding RNA (ncRNA) to generate high-entropy, quantum-resistant keys and produce unpredictable ciphertexts. The framework employs a novel, multi-stage process: encoding plaintext into RNA sequences, predicting and manipulating RNA secondary structures using advanced algorithms, and deriving cryptographic keys through the intrinsic physical unclonability of RNA molecules. Experimental evaluations indicate that, although cryptoncRNA's encryption speed is marginally lower than that of AES, it significantly outperforms RSA in terms of efficiency and scalability while achieving a 100% pass rate on the NIST SP 800-22 randomness tests. These results demonstrate that crypto-ncRNA offers a promising and robust approach for securing digital infrastructures against the evolving threats posed by quantum computing. Moreover, with the rapid advancement of artificial intelligence, RNA-based research has gradually unfolded into a new realm of innovation (Townshend et al. (2021)). Recent studies showed that the dynamic folding processes of RNA molecules intrinsically exhibit physical unclonable functions (PUFs) characteristics (Herder et al. (2014); Li et al. (2022); Luescher et al. (2024); Zhou et al. (2021)), thereby establishing a pathway for designing post-quantum cryptography (PQC) systems (Arapinis et al. (2021); Cambou et al. (2021)).

ai4na workshop, artificial intelligence, data length, (15 more...)

2504.17878

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.54)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)
Banking & Finance > Trading (0.86)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence (1.00)

Klypa, Roman, Bietti, Alberto, Grudinin, Sergei

BAnG: Bidirectional Anchored Generation for Conditional RNA Design

arXiv.org Artificial IntelligenceFeb-28-2025

Designing RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Existing computational approaches require a substantial amount of experimentally determined RNA sequences for each specific protein or a detailed knowledge of RNA structure, restricting their utility in practice. To address this limitation, we develop RNA-BAnG, a deep learning-based model designed to generate RNA sequences for protein interactions without these requirements. Central to our approach is a novel generative method, Bidirectional Anchored Generation (BAnG), which leverages the observation that protein-binding RNA sequences often contain functional binding motifs embedded within broader sequence contexts. We first validate our method on generic synthetic tasks involving similar localized motifs to those appearing in RNAs, demonstrating its benefits over existing generative approaches. We then evaluate our model on biological sequences, showing its effectiveness for conditional RNA sequence design given a binding protein.

bidirectional anchored generation, rna sequence, sequence, (14 more...)

2502.21274

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceFeb-11-2025

NatureLM: Deciphering the Language of Nature for Scientific Discovery

Xia, Yingce, Jin, Peiran, Xie, Shufang, He, Liang, Cao, Chuan, Luo, Renqian, Liu, Guoqing, Wang, Yue, Liu, Zequn, Chen, Yuan-Jyue, Guo, Zekun, Bai, Yeqi, Deng, Pan, Min, Yaosen, Lu, Ziheng, Hao, Hongxia, Yang, Han, Li, Jielan, Liu, Chang, Zhang, Jia, Zhu, Jianwei, Wu, Kehan, Zhang, Wei, Gao, Kaiyuan, Pei, Qizhi, Wang, Qian, Liu, Xixian, Li, Yanting, Zhu, Houtian, Lu, Yeqing, Ma, Mingqian, Wang, Zun, Xie, Tian, Maziarz, Krzysztof, Segler, Marwin, Yang, Zhao, Chen, Zilong, Shi, Yu, Zheng, Shuxin, Wu, Lijun, Hu, Chen, Dai, Peggy, Liu, Tie-Yan, Liu, Haiguang, Qin, Tao

Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be represented as sequences, which together form the "language of nature", we introduce Nature Language Model (briefly, NatureLM), a sequence-based science foundation model designed for scientific discovery. Pre-trained with data from multiple scientific domains, NatureLM offers a unified, versatile model that enables various applications including: (i) generating and optimizing small molecules, proteins, RNA, and materials using text instructions; (ii) cross-domain generation/design, such as protein-to-molecule and protein-to-RNA generation; and (iii) achieving state-of-the-art performance in tasks like SMILES-to-IUPAC translation and retrosynthesis on USPTO-50k. NatureLM offers a promising generalist approach for various scientific tasks, including drug discovery (hit generation/optimization, ADMET optimization, synthesis), novel material design, and the development of therapeutic proteins or nucleotides. We have developed NatureLM models in different sizes (1 billion, 8 billion, and 46.7 billion parameters) and observed a clear improvement in performance as the model size increases.

large language model, machine learning, natural language, (20 more...)

2502.07527

Country:

Europe > Austria > Vienna (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > District of Columbia > Washington (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.84)

arXiv.org Artificial IntelligenceDec-26-2024

Biology Instructions: A Dataset and Benchmark for Multi-Omics Sequence Understanding Capability of Large Language Models

He, Haonan, Ren, Yuchen, Tang, Yining, Xu, Ziyang, Li, Junxian, Yang, Minghao, Zhang, Di, Yuan, Dong, Chen, Tao, Zhang, Shufei, Li, Yuqiang, Dong, Nanqing, Ouyang, Wanli, Zhou, Dongzhan, Ye, Peng

Large language models have already demonstrated their formidable capabilities in general domains, ushering in a revolutionary transformation. However, exploring and exploiting the extensive knowledge of these models to comprehend multi-omics biology remains underexplored. To fill this research gap, we first introduce Biology-Instructions, the first large-scale multi-omics biological sequences-related instruction-tuning dataset including DNA, RNA, proteins, and multi-molecules, designed to bridge the gap between large language models (LLMs) and complex biological sequences-related tasks. This dataset can enhance the versatility of LLMs by integrating diverse biological sequenced-based prediction tasks with advanced reasoning capabilities, while maintaining conversational fluency. Additionally, we reveal significant performance limitations in even state-of-the-art LLMs on biological sequence-related multi-omics tasks without specialized pre-training and instruction-tuning. We further develop a strong baseline called ChatMultiOmics with a novel three-stage training pipeline, demonstrating the powerful ability to understand biology by using Biology-Instructions. Biology-Instructions and ChatMultiOmics are publicly available and crucial resources for enabling more effective integration of LLMs with multi-omics sequence analysis.

large language model, machine learning, natural language, (18 more...)

2412.19191

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Health & Medicine > Therapeutic Area (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jeon, Nicholas, Qian, Xiaoning, SaidyKhan, Lamin, de Figueiredo, Paul, Yoon, Byung-Jun

LoRA-BERT: a Natural Language Processing Model for Robust and Accurate Prediction of long non-coding RNAs

arXiv.org Artificial IntelligenceNov-11-2024

Long non-coding RNAs (lncRNAs) serve as crucial regulators in numerous biological processes. Although they share sequence similarities with messenger RNAs (mRNAs), lncRNAs perform entirely different roles, providing new avenues for biological research. The emergence of next-generation sequencing technologies has greatly advanced the detection and identification of lncRNA transcripts and deep learning-based approaches have been introduced to classify long non-coding RNAs (lncRNAs). These advanced methods have significantly enhanced the efficiency of identifying lncRNAs. However, many of these methods are devoid of robustness and accuracy due to the extended length of the sequences involved. To tackle this issue, we have introduced a novel pre-trained bidirectional encoder representation called LoRA-BERT. LoRA-BERT is designed to capture the importance of nucleotide-level information during sequence classification, leading to more robust and satisfactory outcomes. In a comprehensive comparison with commonly used sequence prediction tools, we have demonstrated that LoRA-BERT outperforms them in terms of accuracy and efficiency. Our results indicate that, when utilizing the transformer model, LoRA-BERT achieves state-of-the-art performance in predicting both lncRNAs and mRNAs for human and mouse species. Through the utilization of LoRA-BERT, we acquire valuable insights into the traits of lncRNAs and mRNAs, offering the potential to aid in the comprehension and detection of diseases linked to lncRNAs in humans.

artificial intelligence, machine learning, natural language, (16 more...)

2411.08073

Country:

North America > United States > Missouri (0.04)
North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.68)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)